(54) Recommender system and method

(57) A system for providing item recommendations includes a memory (40), a device (10), responsive to a
user request, for recording an item on a hardcopy medium, and a processor (12), for storing ratings of items
and for generating recommendations for new items based on recommendation criteria. In response to the
user request, the processor (12) stores an implicit rating for the requested item in the memory (40), determines
whether, based on the implicit rating and the recommendation criteria, to generate an item recommendation,
and if the criteria for generating a recommendation is met, generates a recommendation of a new item. The
recommender system may further store a representation of the recorded item in the memory. Recommendations
may be based on item to item similarities, item to user similarities or user to user similarities.
Description

[0001] This invention relates to recommender systems, and more particularly, to a recommender system associated
with a hardcopy media device for generating implicit ratings.

[0002] With the increasing use of electronic media, the demise of paper as a communication medium seemed
plausible. However, the promise of the "paperless" office has not yet come to pass. It is still true, for example, that
almost all important documents are printed at least once during their life, because paper is still the most convenient
medium for reading, annotating and sharing documents. The combination printer/facsimile/copier room of a work
group is a crossroads through which passes much of the relevant information embodied in documents.

[0003] Despite the availability of electronic information, within a workgroup employees often rely on social interaction
and happenstance to discover relevant new documents and share other kinds of information. Without face-to-face
interactions, a person finding a relevant document might not otherwise be aware of a colleague's interest, or might not
see the link between a particular piece of information and what he or she perceives as being the colleague's set of
interests.

[0004] Recommender systems, in particular collaborative recommender systems, can be part of the solution. They
help augment the sharing of relevant information and allow users to declare their interests. However, until recently,
workplace recommender systems have required active participation from users.

[0005] The use of implicit ratings (ratings deduced from behavior) to compute recommendations has been proposed
in the literature.

[0006] Recommender systems which capture implicit ratings generally provide the benefit of obtaining a greater
number of ratings than those systems requiring active participation. A recommender system which generates implicit
ratings in a work group environment would provide even greater benefits. A system which augments print/scan/fax
services without the need to acquire additional dedicated hardware or the need for users to upload and download files
would provide benefits to users.

[0007] The invention gathers recommendations without the active participation of users, by deducing implicit
recommendations from a work group's use of a shared recording device, such as a printer, a copier, a scanner or a set of
printers, copiers or scanners, or some combination thereof. Like other workplace recommender systems, the invention
offers recommendations and search mechanisms that address the problem of sharing relevant information within a
work group, but at almost zero additional cost to users.

[0008] A system for providing item recommendations, according to the invention, includes a memory, a device,
responsive to a user request, for recording an item on a hardcopy medium, and a processor, for storing ratings of items
and for generating recommendations for new items based on recommendation criteria. In response to the user request,
the processor stores an implicit rating for the requested item in the memory, determines whether, based on the implicit
rating and the recommendation criteria, to generate an item recommendation, and if the criteria for generating a
recommendation is met, generates a recommendation of a new item. The processor and memory may be co-located with
the recording device. Alternatively, the processor and memory may be located remotely from the recording device, and
connected to the recording device via a local intranet or via the Internet.

[0009] The recommender system may further store a representation of the recorded item in the memory, which may
be a representation of the entire recorded item, a thumbnail image of the recorded item, a set of item attributes or a
characterization of the recorded item's content. The representation is used by the recommender system to measure
or determine item similarities with other items or user preferences as stated in a user profile. If a user profile is stored
in the system, the processor, responsive to the user request, updates the user's profile with the implicit rating.
Recommendations may be based on item to item similarities, item to user similarities or user to user similarities.
[0010] User to user similarities may be determined preferably in one of two methods. In a first method, the
recommender system characterizes the content of the recorded item using linguistic tools, generates a historical linguistic
user profile for each user comprising a list of terms extracted from user recorded items and frequency of occurrence
of such extracted terms, and generates a current linguistic user profile for each user comprising a list of terms extracted
from user recorded items with terms being weighted by a damping factor, $e^{-at}$, where t = today - timestamp of
association of the recorded item with the user and a is the damping coefficient. If two users have similar linguistic
profiles, items recorded by one user may be recommended to the other user.

[0011] In the second method, the recommender system determines an action based user similarity rating by
correlating the number of user provided items in the user's profile to the total number of recorded item representations stored
in the memory. If two users have similar recording and/or rating histories, then an item or document recorded by one
of them in the future is likely to interest the other person as well.

[0012] The invention extends the office printer (or a set of printers) in such a way that it becomes a recommender
system. The action of recording (e.g., printing, scanning, copying) is taken to be an implicit declaration of interest.
Users retain the capability to actively recommend documents to the system, if a separate input interface is provided
to receive explicit rating input. The key advantages of a recommender system remain: personalized recommendations,
knowledge sharing, reputation mechanisms, workgroup/community memory, and search and browsing functions, now
with the added advantage of a greater number of implicit ratings at no additional cost to the work group.
[0013] The recommendation system of the invention provides several functionalities. Users can automatically receive
notification of documents similar to those they have recently or historically printed (using a document-user similarity
measure). Users can automatically receive recommendations of documents printed by users with similar preferences
(using a user-user similarity measure). Users can find documents similar to a given document (using a document-document
similarity measure). Users can find other readers of a given document (using an optional search functionality).
Users can find other readers of documents similar to a given document (using a document-document similarity measure).
Users can, optionally, receive a random document of the day.

[0014] Upon printing a document, users may receive one or more of those functionalities electronically via a user
display or interface. Alternatively, users may receive recommendations printed on a printer cover sheet (which can be
configured by an administrator to automatically display one or more notifications of the different kinds described above)
as described in copending US Patent Application No. 09/746813, which is incorporated herein by reference.
[0015] In addition to providing recommender services to users of recording devices such as printers, facsimile
machines and scanners, other services may also be provided without the need to buy and install additional software or
appliance. A knowledge management system, which provides such additional services, includes a device, responsive
to a user request, for recording a requested item on a hardcopy medium, and a knowledge management service located
on a distributed network remote from the device for providing services associated with items in the system. The service
includes a repository and a processor, wherein for each item requested to be recorded, the knowledge management
service stores an electronic copy of the recorded item in the repository, generates and stores a record of the user
request with the requested item in the repository and associates a service with the requested item. The system also
includes an input device for requesting services associated with items on the system. The input device and recording
device may be the same device if the recording device is configured to enable input requests to the system.
[0016] The knowledge management system seamlessly captures the stream of recorded (e.g., printed, faxed
and scanned) documents. When a physical printer at a user location is associated with the knowledge management
system, whose knowledge management services are located remotely, a virtual printer becomes associated with the
physical one. The virtual printer is available to augment the service of the physical printer. The user location and knowledge
management services may be connected via an intranet or via the Internet. When printing, the user has the option of
selecting a physical printer not on the system or a "virtual" printer on the system. By printing on the system printer, the
user enables the storage service, which keeps a print-ready version of the document, e.g., a PostScript or PDF file, in
the user's personal print memory. The availability of the printed documents in the repository also creates a workgroup
memory of relevant (because printed) documents on top of which it is possible to provide additional personal and
collaborative services, without the requirement to have storage and processing at the user location.
[0017] Many different types of services now become available to the local user. Recorded documents may be indexed
for searching and fast retrieval. Contextual memory can be used to support the search (e.g., "I remember I printed that
document last week"). Related documents (the friends documents) may be retrieved along with the names of users
who recorded them to support awareness of related activities and facilitate expertise location. Documents may be
clustered and categorized to support self-awareness of activities and shifts in individual and group interests. Multiple
versions of the same document (twin documents) may be recognized, allowing automatic versioning even when
multi-authoring occurs. By extension, the system can trigger an alert to earlier readers or authors when a more recent version
of a document is printed.

[0018] When the knowledge management system is connected via the Internet, a Web server that allows users to
access the services from their Web browser may be used. DocuShare, for example, may be used to organize the
document repository both for the storage and the access of the printed documents, and for providing the associated
services.

[0019] An important feature of the knowledge management system is a recording archive, called a print memory.
The system intercepts the print/scan/fax requests from local users and records the documents in a digital archive. More
precisely, a local printer becomes a virtual printer when the system creates a print queue for it in a remote system
server. The users then print through the system server rather than directly toward the printer. The system thus has not
only the opportunity to record the print job but also to provide additional information or service by augmenting the print
job. In particular, by replacing the print banner by a FlowPort™ form created on the fly specifically for the current print
job (as described in copending US Application No. 09/746913) and the requesting user, the system provides highly
valuable services, pertaining either to the knowledge sharing or workflow domains. A key advantage to the system
resides in the non-disruptive nature of the service, as it gets fed and activated by the user's print/recording actions.

[0020] Although the knowledge management system may provide services in place, i.e., on the local network, it is
additionally beneficial to provide these services via the Internet. In this case a third party may provide the storage and
associated document services to the small or home office, relieving them from the installation and administration costs
of a dedicated appliance or software. For such an Internet based system, for example, users may register with an
Internet web site of the knowledge management service provider. Users may register one or several of their own printers
(facsimile machines, scanners or multi-function devices) with the service provider, specifying their network address
and preferred print protocol. The Internet knowledge management web site provides the customer with the augmented
recording queues corresponding to the registered devices. Users may request services through any convenient input
device.

[0021] Figure 1 is a block diagram of a system for providing recommendations according to the invention; and,
[0022] Figures 2 and 3 are block diagrams of a distributed knowledge management system according to the invention.
[0023] Referring to Figure 1, a system for providing recommendations 100 includes a device 10 for recording a
document (an item) on a hardcopy medium such as paper, a processor 12 and a memory or repository 40. Memory
40 includes regions for storing document representations 42, ratings 44, user profiles 46 and recommendation criteria
48. When a user 50 submits a request to record (e.g., print) a document to device 10, processor 12 stores an implicit
rating of the requested document in ratings 44. If user 50 has a user profile stored in user profiles 46, processor 12
updates user 50's user profile to indicate an implicit rating of the requested document. Processor 12 generates a
recommendation to provide to user 50 using whatever recommendation criteria have been stored in recommendation
criteria 48. A recommendation may be generated based on a determination of document-document similarity (similarity
of the requested document to other documents in the recommender system), a determination of document to user
similarity (documents similar to those the user has printed) or a determination of user to user similarity (documents
printed by other users having a similar user profile). After processor 12 arrives at a recommendation of one or more
other documents to provide to user 50, these recommendations may be provided to user 50 in different ways.
[0024] As described in copending, coassigned US Patent Application No. 09/748913, a recommendation may be
provided to user 50 by printing it on the printer output cover sheet 20 which precedes the printed document 30.
Alternatively, user 50 may access electronic interface 60 and read the recommendations on a display associated with
interface 60. Electronic interface 60 may be, for example, a computer, a personal digital appliance (PDA), a cell phone
with Internet email or a networked workstation. Electronic interface 60 may be connected to system 100 directly, wirelessly,
via an intranet connection or via an Internet connection. If recommender system 100 includes optional search
functionality, user 50 may search recommender system 100 for documents of interest based on whatever input
criteria user 50 submits.

[0025] The invention, in this embodiment, can be considered as extending a printer (or some other type of recording
device) to a recommender system. The combination of processor and memory operates functionally to provide a module
to compute and store representations of the printed documents of a work group; a module to measure similarity among
printed documents; and a module to measure similarity of interest among people.

[0026] Not all users may wish to participate in the recommender system, so the workgroup may be set up to enable
users to select printing a document in work group mode, in which the user's act of printing will be input into the system
as an implicit rating for the requested document. If the user elects to print in personal mode, no such rating will be
stored and no recommendations will be provided. When in work group mode, the recommendation system adds an
implicit positive rating of the document to the user's profile. The recommender system may be set up with many different
recommendation criteria. For example, the work group may establish a recommendation criterion which requires the
recommender system to extract the context from the transmitted print job and information on the user as well as on
the document, either on-line or off-line. The recommendation system may also store a representation of the document
in a repository (either local or remote); this representation being possibly the document itself or a set of attributes (title,
references, and other metadata about the document) along with, for example, a characterization of its content computed
using linguistic tools. A document similarity module may compute document similarities (again, either on-line or
off-line), on the basis of the stored representation of the printed documents. An interest similarity module may correlate
interests of users on the basis of how much they tend to print similar documents and compute similarities between
documents and user interests (on-line or off-line).

[0027] Once the recommender system has been in place for a period of time storing ratings and generating
recommendations for the work group, other typical features of recommender systems may be provided. For example, the
recommender system may create a map of what has been printed in a work group. This information can then be
browsed or searched from an electronic interface 60 to the system 100. In itself this collection has value as a corporate
or workgroup memory. The methods implemented for indexing and browsing such a collection as described in
EP-A-1 050 832 are applicable here as well.

[0028] Device 10 may be a printer, copier, scanner or multi-function device (MFD). An MFD is a digital device that can
scan, store the scanned item in memory and print the scanned item. When an item is presented for printing to an MFD,
the MFD can store an image of the item printed. This image can be stored locally in the MFD's memory, in the
recommender service's memory or in a document repository. If stored in a document repository, the document repository may
be located locally or remotely and accessible by a network. Storing a record or image of each item printed or recorded
enables the recommender system to generate recommendations and to retain a history of items implicitly of interest
to the workgroup. It also enables users to access the stored items. This may be especially advantageous if a transitory
item such as a Web page downloaded from the Internet is printed.

EP 1 217 554 A2

[0029] The recommender system 100 can be thought of as having a system architecture with three layers. A bottom
layer of the system architecture consists of storage, typically implemented using a database to store document
representations and user profiles. Methods for representing documents and users are discussed in more detail below. A
middle layer consists of a set of services implemented via several modules that respectively are responsible for
calculating and updating: (1) document-document similarity; (2) user-user similarity; and (3) document-user similarity.
Finally, the top layer consists of the user interface and access to the services.

[0030] The system uses a networked printer or other recording device to collect implicit ratings on documents from
users in a non-intrusive way. Access to the services is also available directly from the printer, or through standard
electronic interfaces, such as via a browser, e-mail interface or document management system interface. Described
below are a set of methods for calculating three different kinds of similarity measures, together with possible
implementations of user interfaces for the system.

[0031] Document representations and document-document similarity measure. Each time a document passes
through the recommender system 100, the system checks to see if the document is already known to the system. First
a document characteristic is computed as described below. If the characteristic matches one already stored in the
system, then the document is assumed to be "known" and no further steps are taken. Otherwise, the document is
assigned a document identification (docId) and the docId, characteristic and a timestamp are stored (e.g. in a database
table 42 for document characteristics).
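
The known-document check described above can be sketched as follows. This is only an illustration, under the assumption that a characteristic is a keyword-to-frequency mapping that can be reduced to a fingerprint; the class `DocumentStore`, its `register` method and the use of a SHA-256 digest are hypothetical and are not taken from the described system.

```python
import hashlib
import time


class DocumentStore:
    """Hypothetical in-memory stand-in for the document characteristics table."""

    def __init__(self):
        self._by_fingerprint = {}  # fingerprint -> docId
        self.records = {}          # docId -> (characteristic, timestamp)
        self._next_id = 1

    @staticmethod
    def fingerprint(characteristic):
        # Assumption: a characteristic is a dict of keyword -> frequency.
        canonical = repr(sorted(characteristic.items())).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def register(self, characteristic, now=None):
        """Return (docId, is_new). Known documents are left untouched."""
        key = self.fingerprint(characteristic)
        if key in self._by_fingerprint:
            return self._by_fingerprint[key], False
        doc_id = self._next_id
        self._next_id += 1
        timestamp = time.time() if now is None else now
        self._by_fingerprint[key] = doc_id
        self.records[doc_id] = (dict(characteristic), timestamp)
        return doc_id, True
```

An exact-match fingerprint is the simplest reading of "matches one already stored"; a deployed system might instead tolerate small differences between characteristics.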

[0032] Each document contained in the system memory may be characterized using a linguistic method. Other
methods of document characterization may be used. Documents can then be compared against each other to compute
document-document similarities using their respective characteristics.

[0033] One method for computing document characteristics and determining document-document similarities
includes the following. First, if the language used for the source text is not already known then the probable language
is identified (see G. Grefenstette, "Comparing Two Language Identification Schemes", in Proceedings of the 3rd
International Conference on the Statistical Analysis of Textual Data, JADT'95, December 1995, Rome, Italy). Then the
text words are tokenized (see G. Grefenstette and P. Tapanainen, "What is a Word, What is a Sentence? Problems of
Tokenization", in 3rd International Conference on Computer Lexicography and Text Research, COMPLEX'94, July
1994, Budapest, Hungary). Tokenized words are then looked up in a morphological lexicon and the most probable part
of speech tag for each word is calculated (see A. Schiller, "Multilingual Part-of-Speech Tagging and Noun Phrase
Mark-up", in the 15th European Conference on Grammar and Lexicon of Romance Languages, September 1996, University
of Munich). This is used to produce a normalized form of each word. As this process occurs, duplicate words are
discarded and a count of the number of occurrences of each word is kept to enable the calculation of weights based
on word frequency. Finally, a list of stop words (see G. Salton, "The SMART Retrieval System: Experiments in Automatic
Document Processing", Prentice-Hall, 1971) for the source language is used to discard frequent words that are not
used for classifying the text, such as conjunctions and prepositions.
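
A much-simplified sketch of the last steps of this pipeline (tokenizing, counting occurrences and discarding stop words) is given below. It is an assumption-laden illustration: language identification, morphological lookup and part-of-speech tagging are omitted, normalization is reduced to lowercasing, and the function name `keyword_frequencies` and the tiny stop list are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop list (assumption); a real list would be per language.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "or", "to", "for", "is"}


def keyword_frequencies(text, stop_words=STOP_WORDS):
    """Return a Counter of keyword -> occurrences for one document."""
    # Crude tokenizer: runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Normalization stand-in: lowercasing only, no lemmatization.
    return Counter(t for t in tokens if t not in stop_words)
```

The resulting counter plays the role of the keyword list with frequencies that the similarity measure consumes.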

[0034] As a result of this process each document has associated with it a list of keywords with respective frequencies.
These lists of frequencies can be used to calculate the similarity between two documents using the weighted Jaccard
algorithm (see G. Grefenstette, "Explorations in Automatic Thesaurus Discovery", Kluwer Academic Press, 1994).
Keywords are first given weights inversely proportional to their frequency in the corpus so that less frequent words,
which are better discriminators, have a higher weight.

[0035] However, this approach may not always be optimal since isolated keywords are not necessarily the best
indicators of the content of a document. For example, a document containing the phrase "science fiction" would have
some correlation with a document containing "computer science". In addition, available stop word lists will not contain
words that have little discriminating power, for example "person". To address these concerns there are several
refinements that can be made to the above approach. First, instead of using single keywords the system can identify noun
phrases and use only these. To achieve better discrimination the system can be set to only use noun phrases consisting
of a specified minimum number of words; for example, "information retrieval" is likely to discriminate between
documents better than simply using "information" and "retrieval" as isolated keywords. Secondly, additional methods of
discarding keywords can be used. Corpora like the British National Corpus provide a list of words, their part of speech,
their frequency and the number of documents in the corpus that they appear in. Using this information, it is possible
to determine whether a word is occurring with above average frequency in a specific text compared with how frequently
it appears on average, thereby enabling only keywords of above average frequency (which are then presumably more
closely related to the subject domain of the text) to be used in the similarity measure.
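
The second refinement can be illustrated with a short sketch. It assumes that a reference corpus supplies a relative frequency per word; the function name `above_average_keywords`, the input format and the decision to keep words unknown to the corpus are all assumptions made for the example.

```python
def above_average_keywords(doc_counts, corpus_rel_freq):
    """Keep keywords that occur more often in the document than in the corpus.

    doc_counts: keyword -> count in the document.
    corpus_rel_freq: keyword -> relative frequency in a reference corpus
        (hypothetical input, e.g. derived from a corpus word list).
    """
    total = sum(doc_counts.values())
    if total == 0:
        return {}
    kept = {}
    for word, count in doc_counts.items():
        doc_rel = count / total
        # Assumption: words unknown to the corpus are treated as rare and kept.
        if doc_rel > corpus_rel_freq.get(word, 0.0):
            kept[word] = count
    return kept
```

With this filter a generic word such as "person" is dropped whenever it is no more frequent in the text than in the language at large.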

[0036] The similarity between two documents is then calculated as the sum of the weights of all keywords (or phrases)
the two documents have in common divided by the sum of the weights of all keywords associated with the two documents
X and Y (Equation 1 below).

\[
\mathrm{sim}(X, Y) = \frac{\sum_{k \in K_X \cap K_Y} w_k}{\sum_{k \in K_X \cup K_Y} w_k}
\]

Equation 1

where $K_X$ and $K_Y$ are the sets of keywords (or phrases) associated with documents X and Y, and $w_k$ is the weight of
keyword k.
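
The weighted Jaccard measure described for Equation 1 can be sketched as follows. The function names, the use of `1 / frequency` as the inverse weighting and the default threshold are assumptions chosen for illustration; the text only requires weights inversely proportional to corpus frequency and "a given threshold".

```python
def keyword_weights(corpus_freq):
    """Weight each keyword inversely to its frequency in the corpus."""
    return {word: 1.0 / freq for word, freq in corpus_freq.items() if freq > 0}


def weighted_jaccard(keywords_x, keywords_y, weights):
    """Sum of weights of shared keywords over sum of weights of all keywords."""
    shared = set(keywords_x) & set(keywords_y)
    union = set(keywords_x) | set(keywords_y)
    denominator = sum(weights.get(k, 0.0) for k in union)
    if denominator == 0:
        return 0.0
    return sum(weights.get(k, 0.0) for k in shared) / denominator


def is_similar(keywords_x, keywords_y, weights, threshold=0.5):
    # The threshold value is an illustrative assumption.
    return weighted_jaccard(keywords_x, keywords_y, weights) > threshold
```

In this sketch a document about "science fiction" and one about "computer science" share only the common word "science", so their score stays low when that word carries a small weight.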
[0037] The document-document similarity measure can be used to identify: other documents similar to a given
document (e.g., the one being printed by a user); and colleagues who have printed documents similar to a given document
(and thus are more likely to be exploring similar topics). Two documents are defined as being similar when their
document-document similarity measure exceeds a given threshold.

[0038] User profiles and user-user and user-document similarity measures. Each time a user prints/scans/copies a
document, the system stores a record of the user's identification (userId), the docId, the print/scan/copy action and a
timestamp. If the system also has access to "read" events for documents (via electronic monitoring of user interfaces,
for example), then the system has the capability to store records of userId, docId, the read action, time spent reading
and timestamp for those events. Explicit ratings and comments provided by the user through either a printer cover
sheet (as described in USSN 09/746913) paper interface or via an electronic user interface may also be stored, when
available. In this way, user profiles of both implicit ratings ("print" and/or "read" actions for example) and explicit ratings
(numerical scores and comments for example) may be constructed incrementally over time.

[0039] User profiles can also include term-frequency lists extracted from documents associated with the user (i.e.,
documents printed, read or otherwise recommended by a user). One advantageous method maintains two such
term-frequency lists for each user. The first list is extracted from the set of all documents associated with a user. This list is
called the user's historical linguistic user profile. For the second list, terms from documents more recently associated
with the user are weighted more heavily than terms from documents whose association is further in the past. This can
be achieved, for example, by multiplying document term weights by a damping factor, $e^{-at}$, where t = today -
timestamp of association of document with the user and a is the damping coefficient. This second list is referred to as
the user's "current" linguistic profile.
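
The recency weighting can be sketched as below. The function name `current_linguistic_profile`, the input format, the choice of days as the time unit and the default value of the damping coefficient are assumptions made for the example.

```python
import math
from collections import defaultdict


def current_linguistic_profile(associations, today, a=0.1):
    """Build a recency-weighted term profile.

    associations: iterable of (timestamp, {term: weight}) pairs, one per
        document associated with the user; timestamps and `today` share a
        unit (days here, by assumption).
    a: damping coefficient; 0.1 is an illustrative value.
    """
    profile = defaultdict(float)
    for timestamp, term_weights in associations:
        t = today - timestamp          # age of the association
        damping = math.exp(-a * t)     # e^(-a t)
        for term, weight in term_weights.items():
            profile[term] += weight * damping
    return dict(profile)
```

Calling the same function with `a=0` applies no damping, which gives an undamped sum comparable to the historical profile.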

[0040] Various methods can be used to determine a user-user similarity. For example, in an action-based user-user
similarity, the measure of similarity calculates the correlation of users' print/rate actions over the set of documents
known by the recommender system. In a linguistic user-user similarity, the measure of similarity compares the overlap
in users' linguistic profiles, just as the document-document similarity measure described above computes the overlap
in documents' characteristics. The overlap between a user's linguistic profile and a document characteristic may also
be determined.

[0041] Action-based user-user similarity. This approach is based on the assumption that if two users have similar
printing and/or rating histories then a document acted on by one of them in the future is likely to interest the other
person as well. In essence, this approach is an extension of the passive collaborative filtering algorithms used by many
existing recommender systems to take into account implicit ratings.
[0042] The system builds up a correlation vector for each pair of users, x and y, $\{\mathrm{print}_{xy}, \mathrm{rate}_{xy}\}$. For print actions
(other implicit recommending actions such as reading, scanning, copying may be added as well) the correlation between
two users is the relative frequency in which the two users perform the same action on the same documents, given by
Equation 2:

\[
\mathrm{print}_{XY} = \frac{\text{number of documents printed by both } X \text{ and } Y}{\text{total number of documents printed by } X \text{ and } Y}
\]

That is, the print correlation for two users X and Y is the number of documents in common that both users have printed,
divided by the total sum of documents printed by both users.
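
A sketch of the print correlation follows. The wording "total sum of documents printed by both users" leaves the denominator open to interpretation; the example assumes it means the number of documents printed by X plus the number printed by Y, and an alternative reading would count each shared document only once. The function name `print_correlation` is hypothetical.

```python
def print_correlation(printed_x, printed_y):
    """Shared printed documents over the two users' combined print counts.

    printed_x, printed_y: iterables of docIds printed by each user.
    Assumption: the denominator is |X| + |Y|, so the value lies in [0, 0.5].
    """
    docs_x, docs_y = set(printed_x), set(printed_y)
    total = len(docs_x) + len(docs_y)
    if total == 0:
        return 0.0
    return len(docs_x & docs_y) / total
```

Under the alternative reading the denominator would be `len(docs_x | docs_y)`, which rescales the value to the range 0 to 1 without changing the ordering of user pairs in most cases.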

[0043] The rating correlation between two users can be calculated statistically, for example using the Pearson
algorithm described by P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom & J. Riedl: "GroupLens: An Open Architecture for
Collaborative Filtering of Netnews", in Proceedings of CSCW'94, October 22-26, Chapel Hill, NC, 1994 and shown in
Equation 3. $X_i$ and $Y_i$ represent the ratings of user X and Y respectively for item i. The algorithm yields values that
range from -1 (when X and Y tend to disagree), to 0 (when X and Y's actions are uncorrelated) and to 1 (when X and
Y tend to agree perfectly). Note that the only items taken into account for these computations are the ones that both
X and Y have rated.



\[
r_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}
       = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2}\,\sqrt{\sum_i (Y_i - \bar{Y})^2}},
\quad \text{where } -1 \le r_{XY} \le 1
\]

Equation 3
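
The Pearson rating correlation of Equation 3 can be sketched as follows, restricted to co-rated items as the text requires. The function name `rating_correlation` and the choice to return `None` when the correlation is undefined are assumptions made for the example.

```python
import math


def rating_correlation(ratings_x, ratings_y):
    """Pearson correlation over items rated by both users.

    ratings_x, ratings_y: dicts of item id -> numeric rating.
    Returns None when the correlation is undefined (fewer than two
    co-rated items, or one user rated all of them identically).
    """
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return None
    xs = [ratings_x[i] for i in common]
    ys = [ratings_y[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)
```

The normalizing factors of the covariance and the standard deviations cancel, so the sketch works directly with the unnormalized sums.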



[0044] The numerical correlation calculated for X and Y is then taken to be a weighted sum of the individual vectorial
components. The weights assigned to the different components (i.e. print/rate) are parameters to the recommender
system and may also be a function of the number of documents in common for each of the different actions. For
example, the system might take into account the rating/print correlation only when the two users have rated/printed at
least 10 documents in common.
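
One way to combine the components is sketched below. The function name `combined_user_similarity`, the dictionary-based inputs and the hard cut-off below the minimum number of common documents are assumptions; the text leaves the weights as system parameters and only gives the 10-document rule as an example.

```python
def combined_user_similarity(components, common_counts, weights, min_common=10):
    """Weighted sum of per-action correlations.

    components: action -> correlation, e.g. {"print": 0.4, "rate": 0.8}.
    common_counts: action -> number of documents both users acted on.
    weights: action -> weight (system parameters chosen by the deployer).
    Components backed by fewer than `min_common` shared documents are skipped.
    """
    total = 0.0
    for action, value in components.items():
        if common_counts.get(action, 0) >= min_common:
            total += weights.get(action, 0.0) * value
    return total
```

For instance, with 12 documents printed in common but only 3 rated in common, only the print component contributes under the default cut-off.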

[0045] Note that this approach may suffer from the cold start problem (no common action initially) and the sparsity
problem (two users in general act upon different sets of documents). As a result, this approach may yield significantly
useful similarity measures only after the recommender system 100 has collected a good number of user actions.
However, by taking into account many different user actions the recommender system should be able to overcome the
cold-start problem more rapidly than traditional rating-based recommender systems. In fact, initially, the system can be
configured to convolve all user actions into one kind so as to make the most of a sparse set of actions. Thus two users
who in general act upon the same documents are initially taken to be well correlated. As the system collects more
actions for these two users, the more differentiated vectorial approach takes over.
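
One way to picture the initial configuration, in which all user actions are folded into one kind, is the sketch below. The function name, the action labels and the use of a simple shared-over-total ratio are assumptions made for illustration; the description does not specify how the merged actions are compared.

```python
def merged_action_overlap(actions_x: dict, actions_y: dict) -> float:
    """Correlate two users after merging every kind of action into one set.

    actions_*: action kind -> set of document identifiers acted upon.
    """
    docs_x = set().union(*actions_x.values())
    docs_y = set().union(*actions_y.values())
    either = docs_x | docs_y
    if not either:
        return 0.0  # no action recorded yet for either user
    return len(docs_x & docs_y) / len(either)


# Hypothetical sparse histories: no action kind is shared on the same document,
# yet the merged view still finds two documents in common.
x_actions = {"print": {"d1", "d2"}, "rate": {"d3"}}
y_actions = {"print": {"d3"}, "copy": {"d1"}}
early_score = merged_action_overlap(x_actions, y_actions)
```

Here a per-action comparison would find nothing in common, whereas the merged comparison already yields a usable signal.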

[0046] Linguistic user-user and user-document similarities. In addition to computing correlations between users
based on similarity of actions, it is also possible to correlate users by directly comparing their profiles to check for
degree of overlap. User-user linguistic similarity is determined in the same way as document-document similarity. The
only difference is that user-user similarity employs vectors that represent users' interests (either historical or current
interests). These vectors are then regularly updated, either each time a document is printed (in work group mode) or
periodically for all users. Similarly, the overlap in a document characteristic and a user profile can be determined to
obtain the document-user similarity measure.
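
The overlap between a user profile and a document characteristic can be sketched with the measure that the claims give for two recorded items: a sum of weights of keywords in common divided by a sum of weights of all keywords. The function name, the term-to-weight dictionaries and the decision to count a shared keyword's weight from both sides are assumptions made for this example.

```python
def linguistic_overlap(terms_a: dict, terms_b: dict) -> float:
    """Weighted keyword overlap between two term-weight vectors.

    Each argument maps a keyword to its weight; it may describe a document
    or a user profile. Assumption: a shared keyword contributes the weight
    it carries in both vectors.
    """
    total = sum(terms_a.values()) + sum(terms_b.values())
    if total == 0:
        return 0.0  # nothing to compare
    common = set(terms_a) & set(terms_b)
    shared = sum(terms_a[t] + terms_b[t] for t in common)
    return shared / total


# Hypothetical user profile and document characteristic.
profile = {"printer": 2.0, "recommender": 3.0}
document = {"recommender": 1.0, "archive": 2.0}
doc_user_similarity = linguistic_overlap(profile, document)
```

The same function serves for user-user similarity when both arguments are user profiles.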

[0047] The recommender system offers a number of services, as discussed earlier, based on its store of document
characteristics and user profiles and its calculations of user-user, document-user and document-document similarities.
The work group memory thus preserved by the recommender system can be extended by combining it with a repository
of documents browsed in work group mode. Services on top of a repository of browsed documents have been described
in EP-A-1 050832. By combining the two repositories, improved services can be offered via the two different interfaces,
paper interfaces on the one hand and electronic user interfaces on the other. Moreover, the two main services of the
recommender system may be made available to Web Memory: extraction of documents similar to a template, by using
the cover sheet as interface, and personalized notification of documents which are likely of interest.

[0048] Browsing (i.e., reading) actions can then be added as another kind of action to the recommender system. In
fact, the actions of browsing and printing a document are different degrees of declaration of quality and relevance of
a document. When documents are recommended or retrieved, the user interface can indicate both the frequency of
browsing and of printing in the user population.

[0049] The documents stored in the repository because of the read and print actions of the user can then be searched,
to see who has similar interests and what documents are relevant to a topic. Because several people could have read
and printed the same document, this information could be used to rank the value of a certain document. The two actions
can be referred to as hits and be distinguished as read hit and print hit. On the basis of the linguistic representation of
a document and on the basis of the number of hits associated with it, the service can provide two views on the document,
in order to help the navigation of the results. The content based view orders the list of documents on the basis of their
degree of similarity with the user requests, while also showing the hits and their qualitative value (read or print). The
hit view orders the results by putting on top of the list the documents which obtained higher hits measures, while still
showing the strength of similarity with the user's request.

[0050] Recommendation of similar documents triggered by a printed document. Each time a document is printed,
the document itself can be used as a template against which to measure for similarity in the repository. The content of
the printed document may be converted to a linguistic representation that is then used to measure for similarity in the
way described above. The list of results is then presented in one of the two views (content or hit) explained above.
The result can be delivered to the user on the print cover sheet of the document. The cover sheet itself can be an
active token supporting subsequent retrieval of the suggested documents directly from the printer as described in
USSN 09/746913.
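
The content view and the hit view differ only in the ordering key, which the following sketch makes explicit. The `Result` record, its field names and the use of the plain sum of read hits and print hits as the hit measure are assumptions made for illustration; the description does not say how read and print hits are combined.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    similarity: float  # similarity to the template document or request
    read_hits: int
    print_hits: int


def content_view(results):
    """Most similar documents first; hits remain available for display."""
    return sorted(results, key=lambda r: r.similarity, reverse=True)


def hit_view(results):
    """Most frequently read or printed documents first.

    Assumption: the hit measure is the unweighted sum of both kinds of hit.
    """
    return sorted(results, key=lambda r: r.read_hits + r.print_hits, reverse=True)


# Hypothetical result list for one printed template document.
results = [
    Result("a", similarity=0.9, read_hits=1, print_hits=0),
    Result("b", similarity=0.4, read_hits=5, print_hits=3),
    Result("c", similarity=0.7, read_hits=2, print_hits=2),
]
by_content = [r.doc_id for r in content_view(results)]
by_hits = [r.doc_id for r in hit_view(results)]
```

Either ordering keeps every field of each result, so a cover sheet or other interface can still show the similarity alongside the read and print hits.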

[0051] Recommendation of documents of interest on the basis of similarity of interest. The print memory could also
advise purely on the basis of the user-user similarities, without relying on a sample document, as in the previous case.
The previous mode could be defined as reactive and the mode presented here, proactive. While for the reactive mode
a convenient output could be the cover sheet of the printed document taken as template, in the case of the proactive
set of recommendations probably other means would be more appropriate. For example, users could receive by e-mail
a compiled list of documents which the group (at the chosen organizational granularity degree of preference) considers
of interest (either historically or recently).

[0052] In another embodiment of the invention, in addition to providing recommender services to users
of recording devices, other document related services may also be provided. Referring to Figs. 2 and 3, a distributed
knowledge management system 200 is shown. System 200 includes knowledge management service provider 210,
which is located remotely from users 50. In this example, users 50 access the services provided by service provider
210 via the Internet 150. Service provider 210 includes a repository and a processor. Various types of services may
be provided by provider 210 (including the recommender services described above).

[0053] To access the services, user 50 prints a document 120 through a local printer 112, which is connected to
service provider 210. The service provider 210 records the document 120 in the digital archive it hosts for the user 50.
It also processes the print job in order to provide the various document services. This processing may include storing a
copy of the printed document 120, extracting text, content indexing and other services. Service provider 210 then
transmits the print job to the user's printer 112 where the printed document is produced. Service provider 210 may be
augmented with additional storage 220 and data management tools such as FlowPort 240.
[0054] The user 50 picks up the print job from his own printer 112. The user 50 can access browse, search and any
other service via the web site 130 of the service provider 210. The customer can also access services via a
paper-based user interface 110 (such as FlowPort forms) by scanning the filled-in form 122 on a FlowPort enabled MFD 110.
As described in USSN 09/745913, the printed document may be preceded by a special banner page which may be a
FlowPort form allowing the user to conduct further interactions with the service provider on paper. Because the cover
sheet 122 is uniquely identified, it constitutes a pointer to the stored document in the digital archive.
[0055] Distributed system 200 offers knowledge management services to users as well as other advantages. System
200 cuts the acquisition cost of a dedicated device or software. Administration costs, in particular those induced by a
safe archiving of large volumes of data, are cut; archiving is now on a remote repository. The customer benefits from
an up-to-date service. Geographically spread offices can share a single archive seamlessly. Several different purposes
may be accomplished at the same time with system 200. Several offices under the same ownership may share the same
archive; several offices under different respective ownerships may share an archive for collaboration purposes.
[0056] System 200 requires that users give the service provider network access to their printer. This may potentially
require the configuration of a firewall 140, and raises security considerations with respect to a potential improper use
of the printer by malicious persons. This can be solved by setting up access control for the usage of the printer, which
is well known to those skilled in the art of network connectivity. Additionally, security issues may be considered. The
client and server authentication (proving that the user is who he/she claims to be), data integrity (ensuring content
remains unaltered) and privacy (keeping content private) can be properly handled over the Internet using a secure
transport protocol such as Secure HTTP, or the Internet Print Protocol. However, the service provider has full access
to the content. Privacy and content integrity cannot technologically be guaranteed and the customer must trust the
service provider (as it does with banks, public notaries and so on). Print jobs are often of large size and the Internet
bandwidth sometimes constitutes a bottleneck; however, these issues are believed to be minor because of ongoing
bandwidth improvements.

[0057] Examples of services that can be provided by system 200 (by capturing the document recording stream)
include the following. Textual content can be extracted from captured documents and indexed. The current OCR
technology is close to a character recognition rate of 100% for machine-generated characters. PostScript-to-text converters
are an alternative, which is less efficient. A permanent archival of printed documents, with associated search and
visualization services may be created. This archive supports the contextual memory, e.g. "I remember I printed this
document a few weeks ago on that color printer." Automatic clustering and categorizing of documents provide a
hierarchical view of the stored documents.

[0058] From a network perspective, it is assumed that the user has a local area network that is linked to the Internet
by a router, in order to allow the service provider 210 to transmit a print job to the user's printer. This configuration
implies also the usage of a firewall 140. While very frequent for offices, even small ones, this configuration is less
common in home offices, but home office networks may be more popular in the future. To minimize the time delay
caused by sending the print document to service provider 210 which processes the document before the document is
released, an alternative approach is to send a copy of the print job to the service provider (a carbon copy to the service
provider rather than a print through it). This can be accomplished by modifying the print spooler. In this embodiment,
the service provider can extract the data needed from the document while the local printer is generating the print job.
However, in this alternative embodiment, the service provider may not be able to provide information via the print cover
sheet. This embodiment eliminates the need for granting printer access to the service provider, reduces by a factor of
two the size of data transmitted over the Internet and does not slow down the print time.

[0059] The service provider may provide an XML interface through which document content and user requests can
be passed between the user interface and the server. Using an XML interface offers several advantages in that a
number of user interfaces are available. Users can access the service through a paper interface. If a FlowPort form is
produced every time a document is printed, the user can take the FlowPort form to the input device to request services.
This interface supports hand-written note taking, classification of the document and sharing of the document by means
of the document token. Users can access the service provider via a DocuShare account. DocuShare offers an
equivalent of a Web interface. Users can access the service provider via a wireless connection such as through a personal
digital assistant (PDA). Services (browse and reprint) are available from the PalmPort interface. PalmPort supports
infrared based browsing and printing on a multifunctional device. Users can access the service provider through a Digital
Filing Cabinet (DFC). DFC is a user interface developed in Cambridge, U.K. to access high-level document functions
from a multifunctional device (MFD). System services are available from the DFC interface to demonstrate a different
way to retrieve documents from the MFD (i.e., exploiting the contextual memory associated to the print action).
[0060] The knowledge management system seamlessly captures (workgroup and organizational) recording actions
to take benefit from the common repository that is created through those actions. One benefit of the system is that
it can provide a means of retrieving information via clustering and categorization. After a recorded document is stored
and analyzed, a similarity metric is available, based on term weighting on the base of average frequency on the Web.
This metric provides an infrastructure for building a number of services: detection of ancestors (versions), children
(portions) and friend (related) documents; detection of clusters of interests, both to support activity analysis and to
support information exploration activities; community mining, discriminating between communities of practice (with a
high degree of print overlap) and communities of interest (with a high degree of topic overlap).



Claims 

1. A system for providing item recommendations, comprising:

a memory;

a device, responsive to a user request, for recording an item on a hardcopy medium;
a processor, for storing ratings of items and for generating recommendations for new items based on
recommendation criteria;

wherein, responsive to the user request, the processor stores an implicit rating for the requested item in the
memory, determines whether, based on the implicit rating and the recommendation criteria, to generate an item
recommendation, and if the criteria for generating a recommendation is met, generates a recommendation of a
new item.

2. The system of claim 1, wherein the memory stores user profiles for users of the system, wherein each user profile
includes a set of user preferences pertaining to items and

wherein the processor, responsive to the user request, updates the user's profile with the implicit rating.

3. The system of claim 1 or claim 2, wherein the processor further stores a representation of the recorded item in
memory and determines an item similarity for the recorded item.

4. The system of claim 3, wherein the item similarity comprises an item to item similarity determined by comparing
the stored representation of the recorded item with the stored representations of other recorded items stored in
the memory.

5. The system of claim 4, wherein the processor characterizes content of the recorded item using linguistic tools and
wherein the processor determines the item to item similarity between two recorded items by calculating a sum of
weights of keywords in common divided by a sum of weights of all keywords associated with the two recorded items.

6. The system of any of claims 1 to 3, wherein the processor characterizes content of the recorded item using linguistic
tools and wherein the processor generates a historical linguistic user profile for each user comprising a list of terms
extracted from user recorded items and frequency of occurrence of such extracted terms and wherein the processor
generates a current linguistic user profile for each user comprising a list of terms extracted from user recorded
items with terms being weighted by a damping coefficient, $e^{-\alpha t}$, where $t$ = today - timestamp of association of the
recorded item with the user and $\alpha$ is a damping coefficient.

7. The system of any of claims 1 to 3, wherein the processor determines an action based user similarity by correlating
the number of user implicit ratings in the user's profile to the total number of recorded item implicit ratings stored
in the memory.

8. The system of any of claims 1 to 3, wherein the processor characterizes content of the recorded item using linguistic
tools, wherein the processor generates a linguistic user profile for each user comprising a list of terms extracted
from user recorded items and frequency of occurrence of such extracted terms, and wherein the processor
determines an overlap between a user's linguistic profile and a recorded item's linguistic content characterization.

9. A method for generating recommendations, comprising:

providing a user request for recording an item on a hardcopy medium;
storing an implicit rating of the requested item;

determining whether, based on the implicit rating and recommendation criteria, to generate an item
recommendation; and

if the criteria for generating a recommendation is met, generating a recommendation of a new item.

10. The method of claim 9, wherein the recording is selected from the functions of printing, scanning and copying.

11. The method of claim 9 or claim 10, further comprising storing user profiles for users providing user requests,
wherein each user profile includes a set of user preferences pertaining to items and further comprising updating
the requesting user's profile with the implicit rating.

12. The method of any of claims 9 to 11, further comprising storing a representation of the recorded item in memory
and determining an item similarity for the recorded item.

13. The method of any of claims 9 to 11, further comprising storing a representation of the recorded item in memory
and comparing the stored representation of the recorded item with stored representations of other recorded items.

14. The system of any of claims 9 to 13, further comprising determining a user to user similarity for the user by
comparing the user's profile with the other stored user profiles.

15. The method of any of claims 9 to 14, further comprising calculating an item similarities rating between two recorded
items by calculating a sum of weights of keywords in common divided by a sum of weights of all keywords
associated with the two recorded items.

16. The method of any of claims 9 to 15, further comprising:

characterizing content of the recorded item using linguistic tools;

generating a historical linguistic user profile for each user comprising a list of terms extracted from user
recorded items and frequency of occurrence of such extracted terms; and

generating a current linguistic user profile for each user comprising a list of terms extracted from user recorded
items with terms being weighted by a damping coefficient, $e^{-\alpha t}$, where $t$ = today - timestamp of association of
the recorded item with the user and $\alpha$ is the damping coefficient.

17. The method of any of claims 9 to 16, further comprising:

determining an action based user similarity rating by correlating the number of user provided items in the user's
profile to the total number of recorded item representations stored in the memory.

18. The method of any of claims 9 to 15, further comprising:

characterizing content of the recorded item using linguistic tools;

generating a linguistic user profile for each user comprising a list of terms extracted from user recorded items
and frequency of occurrence of such extracted terms; and

determining an overlap between a user's linguistic profile and a recorded item's linguistic content
characterization.

19. A computer program product storing a computer program for performing a method according to any of claims 9 to
18.